Abstract: We study statistical restricted isometry, a property closely related to sparse signal recovery, of deterministic sensing matrices of size $m \times N$. A matrix is said to have a statistical restricted isometry property (StRIP) of order $k$ if most submatrices with $k$ columns define a near-isometric map of $\mathbb{R}^k$ into $\mathbb{R}^m$. As our main result, we establish sufficient conditions for the StRIP property of a matrix in terms of the mutual coherence and mean square coherence. We show that for many existing deterministic families of sampling matrices, $m = O(k)$ rows suffice for $k$-StRIP, which is an improvement over the known estimates of either $m = \Theta(k \log N)$ or $m = \Theta(k \log k)$. We also give examples of matrix families that are shown to have the StRIP property using our sufficient conditions.


I. Introduction 
A. RIP matrices and binary codes 

We study conditioning properties of subdictionaries motivated by the problem of faithful recovery of sparse signals from low-dimensional projections. A universal sufficient condition for reliable reconstruction of sparse signals is given by the restricted isometry property (RIP) of sampling matrices [?]. It has been shown that sparse high-dimensional signals compressed to low dimension using linear RIP maps can be reconstructed using $\ell_1$ minimization procedures such as Basis Pursuit and Lasso [?], [?], [?], [?].

Let $x$ be an $N$-dimensional signal and denote by $[N] = \{1, 2, \dots, N\}$ the set of coordinates. Below we use $\Phi$ to denote the $m \times N$ sampling matrix and write $\Phi_I$ to refer to the $m \times k$ submatrix of $\Phi$ formed of the columns with indices in $I$, where $I = \{i_1, \dots, i_k\} \subset [N]$ is a $k$-subset of $[N]$. We say $\Phi$ is $(k, \delta)$-RIP if every $k$ columns of $\Phi$ satisfy the following near-isometry property:

$$\|\Phi_I^T \Phi_I - \mathrm{Id}\|_2 \le \delta \tag{1}$$

where Id is the identity matrix, and $\|\cdot\|_2$ is the spectral norm (the largest singular value).
It is known that a $k$-RIP matrix must have at least $m = \Omega(k \log(N/k))$ rows [32], [?]. Moreover, if $x$ is compressed to a sketch $y = \Phi x$ of dimension $m$, then $m = \Omega(k \log(N/k))$ samples are required for any recovery algorithm to provide an approximation of the signal with an error guarantee expressed in terms of the $\ell_1$ or $\ell_2$ norm [?], [?] (this bound applies to signals which are not necessarily $k$-sparse). Matrices with random Gaussian or Bernoulli entries with high probability provide the best known error guarantees for recovery from sketches of dimension $m$ that match this lower bound [19], [?], [?].

Let $\mu_{ij} = |\langle \phi_i, \phi_j \rangle|$ be the coherence between columns $i$ and $j$ and denote by $\mu := \max_{i \ne j} \mu_{ij}$ the mutual coherence parameter of the matrix $\Phi$. The relation between the mutual coherence and RIP has served as the starting point in a number of studies on RIP matrix construction [?], [?]. One way of constructing incoherent dictionaries begins with taking a binary code, i.e., a set $C$ of binary $m$-dimensional vectors. We say that the code $C$ has small width if all pairwise Hamming distances between distinct vectors of $C$ are close to $m/2$. For instance, if $m/2 - w \le d(x_i, x_j) \le m/2 + w$ for every $x_i, x_j \in C$, $x_i \ne x_j$, we say that the code has width $w$. A real sampling matrix can be generated from a small-width binary code by mapping bits of the codewords to bipolar signals according to $0 \to 1$, $1 \to -1$. The resulting vectors are normalized to unit length and written in the columns of the matrix $\Phi$. The coherence parameter $\mu(\Phi)$ of the matrix and the width of the code $C$ are connected by the obvious equality $w(C) = \mu(\Phi) m / 2$.
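The relation $w(C) = \mu(\Phi) m/2$ can be checked numerically. The following is a minimal sketch in plain Python; the five codewords in `toy_code` are a hypothetical example chosen for illustration and are not taken from any construction discussed in the text, and the function names are ours.

```python
import math
from itertools import combinations

def bipolar_columns(code):
    """Map bits 0 -> +1, 1 -> -1 and scale each codeword to unit length."""
    m = len(code[0])
    scale = 1.0 / math.sqrt(m)
    return [[scale * (1 - 2 * bit) for bit in word] for word in code]

def width(code):
    """Smallest w with m/2 - w <= d(x_i, x_j) <= m/2 + w over distinct pairs."""
    m = len(code[0])
    return max(abs(sum(a != b for a, b in zip(x, y)) - m / 2)
               for x, y in combinations(code, 2))

def mutual_coherence(cols):
    """Largest absolute inner product between distinct unit-norm columns."""
    return max(abs(sum(a * b for a, b in zip(x, y)))
               for x, y in combinations(cols, 2))

# Hypothetical toy code of length m = 8 (illustration only).
toy_code = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 1, 0, 1, 0, 1),
    (0, 0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 0, 1, 1, 1),
    (1, 1, 1, 0, 1, 0, 0, 1),
]
m = len(toy_code[0])
w = width(toy_code)
mu = mutual_coherence(bipolar_columns(toy_code))
print(w, mu * m / 2)  # the two numbers agree up to rounding
```

The agreement is not specific to this code: two bipolar columns at Hamming distance $d$ have inner product $(m - 2d)/m$, so the largest deviation of $d$ from $m/2$ is exactly $\mu m/2$.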

One of the first papers to put forward the idea of constructing RIP matrices from binary vectors was [24]. While it did not make a connection to error-correcting codes, a number of later papers pursued both its algorithmic and constructive aspects [6], [13], [14], [23]. Examples of codes with small width are given in [2], where they are studied under the name of small-bias probability spaces. RIP matrices obtained from these constructions satisfy $m = O\big(\big(\frac{k \log N}{\log(k \log N)}\big)^2\big)$. In [?] these results were recently improved for values of $\mu$ between two negative powers of $\log N$. The advantage of obtaining RIP matrices from binary or spherical codes is low construction complexity: in many instances it is possible to define the matrix using only $O(\log N)$ columns while the remaining columns can be computed as their linear combinations. We also note a result of [?] that gave the first (and the only known) construction of RIP matrices with $k$ on the order of $m^{1/2+\epsilon}$ (i.e., greater than $O(\sqrt{m})$). An overview of the state of the art in the construction of RIP matrices is given in a recent paper [5].

Taking the point of view that constructions of complexity $O(N)$ are acceptable, the best tradeoff between $m$, $k$ and $N$ for RIP matrices based on codes and mutual coherence is obtained from Gilbert-Varshamov-type code constructions; namely, it is possible to construct $(k, \delta)$-RIP matrices with $m = 4(k/\delta)^2 \log N$. At the same time, already the results of [?] imply that the sketch dimension in RIP matrices constructed from binary codes is at least $m = \Omega((k^2 \log N)/\log k)$.


B. Statistical RIP (StRIP) matrices 

Constructing deterministic RIP matrices or verifying that a matrix satisfies the RIP is a difficult problem. For this reason, in order to approach the optimal sketch dimension $O(k \log(N/k))$, we focus on the following probabilistic relaxation of definition (1).

Definition 1.1 (Statistical Restricted Isometry Property): Let $\Phi$ be an $m \times N$ real matrix, where $m \le N$. Suppose that $I \subset [N]$, $|I| = k$, is chosen uniformly at random among the $k$-subsets of $[N]$. Then $\Phi$ is said to have the $(k, \delta, \epsilon)$-StRIP if

$$P\big(\|\Phi_I^T \Phi_I - \mathrm{Id}\|_2 > \delta\big) < \epsilon.$$
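Definition 1.1 lends itself to a direct Monte Carlo check. The sketch below, in plain Python, estimates the failure probability $P(\|\Phi_I^T\Phi_I - \mathrm{Id}\|_2 > \delta)$ by sampling random $k$-subsets. It is an illustration under stated assumptions, not a procedure from the text: the function names are ours, the spectral norm is only approximated (power iteration on the square of the symmetric matrix), and the five-column dictionary is a hypothetical toy example.

```python
import math
import random

def gram_minus_identity(cols, idx):
    """Return Phi_I^T Phi_I - Id for the (real) columns listed in idx."""
    g = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in idx] for i in idx]
    for t in range(len(idx)):
        g[t][t] -= 1.0
    return g

def spectral_norm_sym(g, iters=100):
    """Approximate the spectral norm of a symmetric matrix by power iteration on g*g."""
    k = len(g)
    v = [1.0 + 0.1 * t for t in range(k)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    top = 0.0
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(k)) for i in range(k)]
        w = [sum(g[i][j] * w[j] for j in range(k)) for i in range(k)]
        top = math.sqrt(sum(x * x for x in w))  # approaches the top eigenvalue of g*g
        if top == 0.0:
            return 0.0
        v = [x / top for x in w]
    return math.sqrt(top)

def strip_failure_rate(cols, k, delta, trials=2000, seed=0):
    """Fraction of sampled k-subsets I with ||Phi_I^T Phi_I - Id|| > delta."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        idx = rng.sample(range(len(cols)), k)
        if spectral_norm_sym(gram_minus_identity(cols, idx)) > delta:
            fails += 1
    return fails / trials

# Toy dictionary: the standard basis of R^4 plus one extra unit vector.
cols = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0),
        (0.0, 0.0, 0.0, 1.0), (0.5, 0.5, 0.5, 0.5)]
rate = strip_failure_rate(cols, k=2, delta=0.25)
print(rate)
```

In this toy example a pair of columns violates the bound exactly when it contains the extra vector, which happens with probability $2/5$, so the estimate should be close to $0.4$. The dictionary is not $(2, 0.25)$-RIP, yet it is $(2, 0.25, \epsilon)$-StRIP for any $\epsilon > 0.4$.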


Except for the name, the StRIP is by no means new in the literature. Tropp [44] showed how StRIP and a condition on the so-called local 2-cumulative coherence

$$\mu_2(I) = \max_{i \notin I} \Big(\sum_{j \in I} \mu_{ij}^2\Big)^{1/2}$$

can support sparse recovery of a class of signals. Candès and Plan [?] used the same technique to prove almost exact recovery for the Lasso estimator.

StRIP is a property of interest in its own right, apart from applications in sparse recovery. Indeed, papers such as [?] are entirely devoted to bounds on the largest singular value of a random collection of columns from a general dictionary. The recent paper [?] states that StRIP is “of great potential interest for a wide class of problems involving high-dimensional linear or nonlinear regression models.” It goes on to investigate sufficient conditions for StRIP based on the mutual coherence of the matrix $\Phi$.

The goal of this paper is to broaden the class of StRIP matrices by establishing a sufficient condition that relies upon easy-to-verify parameters of sampling matrices. In this vein, we introduce a new parameter called the mean square coherence

$$\bar\mu^2 = \max_{1 \le j \le N} \frac{1}{N-1} \sum_{i=1,\, i \ne j}^{N} \mu_{i,j}^2.$$

In many cases, as we will see below, calculations with the mutual coherence parameter can be too pessimistic. In this paper we combine the mean square and mutual coherence parameters to relax the requirements on sampling matrices.
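To make the two parameters concrete, here is a small sketch in plain Python that computes the mutual coherence $\mu$ and the mean square coherence $\bar\mu^2$ of a real matrix given as a list of unit-norm columns. The function name and the four-column example are ours and serve only as an illustration.

```python
import math

def coherence_stats(cols):
    """Return (mu, mu_bar_sq) for a list of real unit-norm columns."""
    n = len(cols)
    mu_ij = [[abs(sum(a * b for a, b in zip(cols[i], cols[j]))) if i != j else 0.0
              for j in range(n)] for i in range(n)]
    mu = max(max(row) for row in mu_ij)
    # For each column j, average the squared coherences with the other N - 1 columns,
    # then take the worst column.
    mu_bar_sq = max(sum(v * v for v in mu_ij[j]) / (n - 1) for j in range(n))
    return mu, mu_bar_sq

r = 1.0 / math.sqrt(2.0)
cols = [(1.0, 0.0), (0.0, 1.0), (r, r), (r, -r)]
mu, mu_bar_sq = coherence_stats(cols)
print(mu, mu_bar_sq)
```

For this example $\mu = 1/\sqrt{2}$ while $\bar\mu^2 = 1/3 < \mu^2 = 1/2$: each column is orthogonal to one of the other three, and averaging registers this whereas the maximum does not.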

Intuitively, the mean square coherence parameter is easier to control than $\mu(\Phi)$. Note that if the matrix $\Phi$ is coherence-invariant (i.e., the set $M_i := \{\mu_{ij},\ j \in [N] \setminus i\}$ is independent of $i$), then $\bar\mu^2$ can be computed for any given $i$ without finding the maximum. Observe that most known constructions of sampling matrices satisfy this property. This includes matrices constructed from linear codes [?], [?], chirp matrices and various Reed-Muller matrices [?], [?], as well as subsampled Fourier matrices [?].

The main contribution of this paper is the derivation of new sufficient conditions for the StRIP property of sampling matrices, stated in Theorem 2.1. The proof of this theorem is based on considering the mean square coherence $\bar\mu^2$ and on detailed analysis of statistical incoherence of sampling matrices. The sufficient conditions that arise are 1) phrased in terms of coherence $\mu$ and $\bar\mu^2$, 2) easy to verify and 3) analytically easy to evaluate for many known families of sampling matrices. We show that our results are better than the estimates known in the literature for a range of the sparsity and the signal dimension that satisfy conditions discussed in Sec. II-B. In general, Theorem 2.1 extends the currently known region of sufficient conditions for StRIP matrices, and for many standard sampling matrices, ensures that $m = O(k)$ rows suffice for $k$-StRIP, which is an improvement over the known estimates of $m = \Theta(k \log N)$.

Application of our results to some deterministic matrices popularized in recent literature on sparse recovery, for instance, the Delsarte-Goethals matrices [?], [?], shows that the statistical RIP property is fulfilled for a smaller sketch dimension $m$ than previously known. We also estimate the dimensions of many other known families of matrices, deriving sufficient conditions for the statistical RIP property. Since the StRIP and statistical incoherence properties suffice for stable recovery with Basis Pursuit, our results, in turn, provide sufficient conditions for sparse recovery for many families of sampling matrices. A more detailed discussion and some further applications of our results appear in an earlier version of this paper in arXiv [?].


II. Main result and discussion
A. Main result 

Theorem 2.1: Let <I> be an m x W matrix. Let e < 
min{l/A:,and suppose that <I> satisfies 

1 . / (1 — a)^b^ 


kp. < 


\og\l/e) 


lin ^ 


321og(2A:) log(e/e) ’ 


and kp^ < 


ab 


log(l/e)’ 

where a,b,c G (0,1) are constants such that 

Olc 

•/a -I- v/2a6 + v/c + — ||'i)|p < jQsfi. 

Then <P is (fc, (5, e)-StRIP. 


( 2 ) 

(3) 

(4) 


B. Comparison to earlier work 

Most relevant to our results are two papers by Tropp [43], [44]. The first of them proved a nearly optimal sufficient condition for StRIP using mutual coherence and matrix norm, namely that $\Phi$ is $(k, \delta, \epsilon)$-StRIP if

$$\mu = O\big((\log N)^{-1}\big) \quad\text{and}\quad \|\Phi\|^2 = O\Big(\frac{N}{k \log N}\Big), \tag{5}$$
where the constants that depend on $\delta$ are absorbed into $O(\cdot)$. For the above result to hold, $\epsilon$ has to be less than $1/k$, just as in Thm. 2.1 above. The restriction on $\mu$ is very mild, while the condition on $\|\Phi\|$ can be further improved. Namely, [44] shows that the conditions

$$\mu = O\big((k \log k)^{-1/2}\big) \quad\text{and}\quad \|\Phi\|^2 = O\Big(\frac{N}{k}\Big) \tag{6}$$

suffice for the $(k, \delta, \epsilon)$-StRIP property. Note that the improvement for $\|\Phi\|$ in (6) over (5) is obtained at the expense of tightening the condition on the coherence. For this reason, conditions (5) are better suited for verifying the StRIP property of deterministic matrices.

Equations (5) and (6) together define the currently known region of sufficient conditions for StRIP matrices. The contribution of Theorem 2.1 is to further extend this region by including matrices that satisfy

$$\mu = O\big((k \log k)^{-1/4}\big), \quad \bar\mu^2 = O(1/k) \quad\text{and}\quad \|\Phi\|^2 = O\Big(\frac{N}{k}\Big). \tag{7}$$

We can claim an improvement over the results of [43] when inequality (7) is better than (5) (in the sense that a smaller value of $m$ is required for the conditions to be satisfied). Most known examples of deterministic sampling matrices, including the examples in Sect. IV below, have mean square coherence of order $\bar\mu^2(\Phi) = O(1/m)$, coherence $\mu = O(1/\sqrt{m})$, and squared spectral norm $\|\Phi\|^2$ of order $N/m$. Hence the most restrictive constraint of the three conditions in (7) is the last one, and (7) essentially reduces to the constraint $m = O(k)$ for many standard sampling matrix families. On the other hand, (5) reduces to the constraint $m = O(k \log N)$ for the same reason. Note that the most restrictive condition in (6) is the first one, which gives rise to the constraint $m = O(k \log k)$ for the sampling matrices of Sect. IV.

The sufficient condition on the coherence $\mu$ implied by (7) is

$$\mu = O\big((k \log k)^{-1/4}\big), \tag{8}$$

which by itself is an improvement over the coherence condition of (5) if $k \log k = O(\log^4 N)$. In the next subsection we discuss a concrete family of sampling matrices for which our results yield better parameters than the conditions known previously.

Apart from this, we also note that imposing the StRIP condition together with the statistical incoherence condition, or SINC (defined below), suffices to prove stable sparse recovery by Basis Pursuit. This observation, which is an extension of known results, is included in the Appendix. We list examples of dictionaries that meet the StRIP and SINC conditions in Sect. IV.

C. Example: Delsarte-Goethals codes

A class of sensing matrices that satisfy the condition of Theorem 2.1 comes from a family of binary codes called the Delsarte-Goethals codes, which are certain nonlinear subcodes of the second-order Reed-Muller codes; see [35], Ch. 15. Suppose that the length of the chosen code is $m$. Writing the code vectors as columns of the matrix and replacing 0 with $1/\sqrt{m}$ and 1 with $-1/\sqrt{m}$, we obtain the following parameters:

$$m = 2^{2s+2}, \quad N = 2^{-r} m^{r+2}, \quad \mu = 2^{r} m^{-1/2}, \tag{9}$$

where $s > 0$ is any integer, and where for a fixed $s$, the parameter $r$ can be any number in $\{0, 1, \dots, s-1\}$. If we take $s$ to be such that $s+1$ is divisible by 3 and set $r = (s+1)/3$, then we obtain

$$m = 2^{6r}, \quad N = 2^{-r} m^{r+2}, \quad \mu = 2^{-2r} = m^{-1/3}.$$


An easy calculation that relies on the Pless identities for binary codes (e.g., [35], p. 132) shows that

$$\bar\mu^2 = \frac{N - m}{m(N - 1)} < \frac{1}{m}. \tag{10}$$

Using the properties of the Delsarte-Goethals codes, it is easy to see that the norm of the sampling matrix $\Phi$ is $\|\Phi\| = \sqrt{N/m}$. Employing the conditions of Theorem 2.1, we observe that $m = O(k \log k)$ samples suffice for this matrix to satisfy the $(k, \delta, 1/k)$-StRIP condition, while (5) requires $m = O(k \log N)$. If $m$ is fixed as above, this implies that using our results we can claim the StRIP property for larger $k$ than was previously known.
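The parameter arithmetic of this example is easy to mechanize. The sketch below, in plain Python, assumes the parameters read as $m = 2^{2s+2}$, $N = 2^{-r} m^{r+2}$, $\mu = 2^{r} m^{-1/2}$ in (9) and the identity $\bar\mu^2 = (N-m)/(m(N-1))$ in (10); it works with base-2 logarithms because $N$ is astronomically large even for small $s$. The function names are ours.

```python
from fractions import Fraction

def dg_parameters(s, r):
    """Return (log2 m, log2 N, log2 mu) for the parameters assumed from (9)."""
    if not 0 <= r <= s - 1:
        raise ValueError("r must lie in {0, 1, ..., s - 1}")
    log2_m = 2 * s + 2
    log2_n = (r + 2) * log2_m - r
    log2_mu = r - log2_m / 2
    return log2_m, log2_n, log2_mu

def mean_square_coherence(m, n):
    """Exact value of (N - m) / (m (N - 1)), as in (10)."""
    return Fraction(n - m, m * (n - 1))

# s + 1 divisible by 3 and r = (s + 1) / 3: here s = 5, r = 2.
log2_m, log2_n, log2_mu = dg_parameters(5, 2)
print(log2_m, log2_n, log2_mu)
mu_bar_sq = mean_square_coherence(2 ** log2_m, 2 ** log2_n)
```

For $s = 5$, $r = 2$ this gives $m = 2^{12}$, $N = 2^{46}$ and $\mu = 2^{-4} = m^{-1/3}$, in agreement with the special case discussed above, and the exact mean square coherence is strictly below $1/m$.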


III. Proof of the main result 

A. Notation 

Let $\Phi$ denote the $m \times N$ real sensing matrix with columns of unit norm. By $\mathcal{P}_k(N)$ we denote the set of all $k$-subsets of $[N]$. The usual notation Pr is used to refer to a probability measure when there is no ambiguity. At the same time, we use separate notation for some frequently encountered probability spaces. In particular, we use $P_{R_k}$ to denote the uniform probability distribution on $\mathcal{P}_k(N)$. We also use $P_{R'_k}$ to denote the uniform distribution on the set $R'_k := \{(I, j) : |I| = k,\ I \subset [N],\ j \in I^c\}$.

To express our results concisely we introduce the following concept.

Definition 3.1: An $m \times N$ matrix $\Phi$ is said to satisfy a statistical incoherence condition (is $(k, \alpha, \epsilon)$-SINC) if

$$P_{R_k}\big(\{I \in \mathcal{P}_k(N) : \max_{i \notin I} \|\Phi_I^T \phi_i\|_2^2 \le \alpha\}\big) \ge 1 - \epsilon. \tag{11}$$

This condition is discussed in [?], [42], and more explicitly in [?]. Following [?], it appears in the proofs of sparse recovery in [?] and below in this paper.

The reason that (11) is less restrictive than the constraint on the coherence parameter $\mu(\Phi)$ is as follows. The columns of $\Phi$ can be considered as points in the real projective space $\mathbb{RP}^{m-1}$. Recall that $\mu(\Phi) = \max_{i \ne j} |\langle \phi_i, \phi_j \rangle|$. The columns of a matrix $\Phi$ with small $\mu(\Phi)$ form a packing of the space with large pairwise separation between the points. Such a packing cannot contain too many elements so as not to contradict universal bounds on packings of $\mathbb{RP}^{m-1}$. At the same time, for the norm $\|\Phi_I^T \phi_i\|_2$ to be large it is necessary that a given column is close to the majority of the $k$ vectors from the set $I$, which is easier to rule out.
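The SINC condition can be probed empirically in the same way as StRIP. The following plain-Python sketch estimates the fraction of $k$-subsets $I$ for which $\max_{i \notin I} \|\Phi_I^T \phi_i\|_2^2$ exceeds $\alpha$. It assumes the squared norm in (11), i.e., the quantity $\sum_{j \in I} \mu_{ij}^2$; the function name and the toy dictionary are ours.

```python
import random

def sinc_failure_rate(cols, k, alpha, trials=2000, seed=0):
    """Fraction of sampled k-subsets I with max_{i not in I} sum_{j in I} mu_ij^2 > alpha."""
    rng = random.Random(seed)
    n = len(cols)

    def sq_coherence(i, j):
        return sum(a * b for a, b in zip(cols[i], cols[j])) ** 2

    fails = 0
    for _ in range(trials):
        idx = set(rng.sample(range(n), k))
        worst = max(sum(sq_coherence(i, j) for j in idx)
                    for i in range(n) if i not in idx)
        if worst > alpha:
            fails += 1
    return fails / trials

# Toy dictionary: the standard basis of R^4 plus one extra unit vector.
cols = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0),
        (0.0, 0.0, 0.0, 1.0), (0.5, 0.5, 0.5, 0.5)]
rate = sinc_failure_rate(cols, k=2, alpha=0.3)
print(rate)
```

Here the worst value is $0.5$ when $I$ consists of two basis vectors (the extra vector is then outside $I$) and $0.25$ otherwise, so with $\alpha = 0.3$ the failure rate should be close to $3/5$, and with $\alpha = 0.5$ it is zero.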
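B. Sufficient conditions for statistical incoherence properties

We begin with establishing a sufficient condition for the SINC property in terms of the coherence parameters of $\Phi$. This result is not necessarily stronger than the result of [43], but is essential in proving our main theorem.

Theorem 3.1: Let $\Phi$ be an $m \times N$ matrix with unit-norm columns, coherence $\mu$, and mean square coherence $\bar\mu^2$. Suppose that

$$\mu^4 \le \frac{(1-a)^2 \beta^2}{32 k (\log(2N/\epsilon))^3} \quad\text{and}\quad \bar\mu^2 \le \frac{a\beta}{k \log(2N/\epsilon)}, \tag{12}$$

where $\beta > 0$ and $0 < a < 1$ are any constants. Then $\Phi$ has the $(k, \alpha, \epsilon)$-SINC property with $\alpha = \beta/\log(2N/\epsilon)$.

Before proving this theorem we will introduce some notation. Fix $j \in [N]$ and let $I_j = \{i_1, i_2, \dots, i_k\}$ be a random $k$-subset such that $j \notin I_j$. The subsets $I_j$ are chosen from the set $[N] \setminus j$ with uniform distribution. Define random variables $Y_{j,l} = \mu_{j,i_l}^2$, $l = 1, \dots, k$. Next define a sequence of random variables $Z_{j,t}$, $t = 0, 1, \dots, k$, where

$$Z_{j,0} = E_{I_j} \sum_{l=1}^{k} Y_{j,l}, \qquad Z_{j,t} = E_{I_j}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, Y_{j,1}, Y_{j,2}, \dots, Y_{j,t}\Big), \quad t = 1, 2, \dots, k.$$

For $t = 1, \dots, k$, let

$$Z_t = E_{R'_k}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, Y_{j,1}, Y_{j,2}, \dots, Y_{j,t}\Big),$$

where $E_{R'_k}$ is the expectation with respect to the distribution $P_{R'_k}$ defined in Section III-A.

Let us show that the random variables $Z_t$ form a Doob martingale. Begin with defining a sequence of $\sigma$-algebras $\mathcal{F}_t$, $t = 0, 1, \dots, k$, where $\mathcal{F}_0 = \{\emptyset, [N]\}$ and $\mathcal{F}_t$, $t \ge 1$, is the smallest $\sigma$-algebra with respect to which the variables $Y_{j,1}, \dots, Y_{j,t}$ are measurable (thus, $\mathcal{F}_t$ is formed of all subsets of $[N]$ of size $\le t+1$). Clearly, $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}_k$, and for each $t$, $Z_t$ is a bounded random variable that is measurable with respect to $\mathcal{F}_t$. Observe that

$$Z_0 = E_j Z_{j,0} = E_{R'_k} \sum_{l=1}^{k} \mu_{j,i_l}^2 \le k \bar\mu^2. \tag{13}$$

The next two lemmas are useful in proving Theorem 3.1.

Lemma 3.2: The sequence $(Z_t, \mathcal{F}_t)_{t=0,1,\dots,k}$ forms a bounded-differences martingale, namely $E_{R'_k}(Z_t \mid Z_0, Z_1, \dots, Z_{t-1}) = Z_{t-1}$ and

$$|Z_t - Z_{t-1}| \le 2\mu^2 \Big(1 + \frac{k}{N-k-2}\Big), \quad t = 1, \dots, k.$$

Proof: In the proof we write $E$ instead of $E_{R'_k}$. We have

$$Z_t = E\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) = \sum_{l=1}^{t} Y_{j,l} + E\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big)$$

$$= Z_{t-1} + Y_{j,t} + E\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) - E\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big).$$

Next,

$$E(Z_t \mid Z_0, Z_1, \dots, Z_{t-1}) = Z_{t-1} + E(Y_{j,t} \mid Z_0, Z_1, \dots, Z_{t-1}) + E\Big(E\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) \,\Big|\, Z_0, \dots, Z_{t-1}\Big) - E\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big)$$

$$= Z_{t-1} + E(Y_{j,t} \mid Z_0, \dots, Z_{t-1}) + E\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, Z_0, \dots, Z_{t-1}\Big) - E\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, Y_{j,1}, \dots, Y_{j,t-1}\Big)$$

$$= Z_{t-1},$$

which is what we claimed.

Next we prove a bound on the random variable $|Z_t - Z_{t-1}|$. We have

$$|Z_t - Z_{t-1}| = \Big| E\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) - E\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big) \Big|$$

$$\le \max_{a,b} \Big| E\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}, Y_{j,t} = a\Big) - E\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}, Y_{j,t} = b\Big) \Big|$$

$$= \max_{a,b} \Big| \sum_{l=t}^{k} \Big( E(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = a) - E(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = b) \Big) \Big|$$

$$= \max_{a,b} \Big| a - b + \sum_{l=t+1}^{k} \Big( E(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = a) - E(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = b) \Big) \Big|$$

$$\le 2\mu^2 \Big(1 + \frac{k}{N-k-2}\Big) = 2\mu^2\, \frac{N-2}{N-k-2}.$$

Proposition 3.3: (Azuma-Hoeffding, e.g., [?]) Let $X_0, \dots, X_{k-1}$ be a martingale with $|X_i - X_{i-1}| \le a_i$ for each $i$, for suitable constants $a_i$. Then for any $\nu > 0$,

$$\Pr\Big( \Big| \sum_{i=1}^{k-1} (X_i - X_{i-1}) \Big| > \nu \Big) \le 2 \exp\Big( -\frac{\nu^2}{2 \sum a_i^2} \Big).$$

Proof of Theorem 3.1: Bounding large deviations for the sum $|\sum_{t=1}^{k} (Z_t - Z_{t-1})| = |Z_k - Z_0|$, we obtain

$$\Pr(|Z_k - Z_0| > \nu) \le 2 \exp\Big( -\frac{\nu^2}{8 \mu^4 k} \Big(\frac{N-k-2}{N-2}\Big)^2 \Big), \tag{14}$$

where the probability is computed with respect to the choice of ordered $(k+1)$-tuples in $[N]$ and $\nu > 0$ is any constant.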
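Using (13) and the inequality $(N-2)/(N-k-2) < 2$ valid for all $k < N/2 - 1$, we obtain

$$\Pr(Z_k > \nu + k\bar\mu^2) \le \Pr(|Z_k - k\bar\mu^2| > \nu) \le 2 \exp\Big( -\frac{\nu^2}{32 \mu^4 k} \Big).$$

Now take $\beta > 0$ and $\nu = \frac{\beta}{\log(2N/\epsilon)} - k\bar\mu^2$. Suppose that for some $a \in (0, 1)$

$$k\mu^4 \le \frac{(1-a)^2 \beta^2}{32} \Big(\log \frac{2N}{\epsilon}\Big)^{-3} \quad\text{and}\quad k\bar\mu^2 \le \frac{a\beta}{\log(2N/\epsilon)}; \tag{15}$$

then we obtain

$$\Pr\Big( \|\Phi_I^T \phi_j\|_2^2 > \frac{\beta}{\log(2N/\epsilon)} \Big) \le 2 \exp\Big( -\frac{\big(\beta/\log(2N/\epsilon) - k\bar\mu^2\big)^2}{32 \mu^4 k} \Big) \le \frac{\epsilon}{N}. \tag{16}$$

Now the first claim of Theorem 3.1 follows by the union bound with respect to the choice of the index $j$.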
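The last inequality in the chain above is tight at the boundary of the hypotheses, which gives a convenient arithmetic check. The sketch below, in plain Python, assumes the conditions read as $k\mu^4 \le (1-a)^2\beta^2 / (32 \log^3(2N/\epsilon))$ and $k\bar\mu^2 \le a\beta/\log(2N/\epsilon)$, and the tail bound as $2\exp(-\nu^2/(32\mu^4 k))$ with $\nu = \beta/\log(2N/\epsilon) - k\bar\mu^2$. Function names and the numerical values are ours.

```python
import math

def theorem_3_1_thresholds(n, k, eps, a, beta):
    """Largest mu^4 and mu_bar^2 allowed by the assumed conditions, and alpha."""
    log_term = math.log(2 * n / eps)
    mu4_max = (1 - a) ** 2 * beta ** 2 / (32 * k * log_term ** 3)
    mu_bar_sq_max = a * beta / (k * log_term)
    alpha = beta / log_term
    return mu4_max, mu_bar_sq_max, alpha

def tail_bound(n, k, eps, beta, mu, mu_bar_sq):
    """Value of 2 exp(-nu^2 / (32 mu^4 k)) with nu = beta / log(2N/eps) - k mu_bar^2."""
    nu = beta / math.log(2 * n / eps) - k * mu_bar_sq
    return 2 * math.exp(-nu ** 2 / (32 * mu ** 4 * k))

n, k, eps, a, beta = 1024, 10, 0.01, 0.5, 1.0
mu4_max, mu_bar_sq_max, alpha = theorem_3_1_thresholds(n, k, eps, a, beta)
bound = tail_bound(n, k, eps, beta, mu4_max ** 0.25, mu_bar_sq_max)
print(bound, eps / n)
```

At the thresholds, $\nu = (1-a)\beta/\log(2N/\epsilon)$ and the exponent equals $-\log(2N/\epsilon)$, so the bound evaluates to exactly $\epsilon/N$; smaller coherence parameters only decrease it, and the union bound over the $N$ choices of $j$ then gives probability at most $\epsilon$.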

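The above proof contains the following statement.

Corollary 3.4: Let $\Phi$ be an $m \times N$ matrix with mutual coherence $\mu$ and mean square coherence $\bar\mu^2$. Let $a \in (0, 1)$ and $\beta > 0$ be any constants. Suppose that for $\alpha < \beta \log_2 e$,

$$\mu^4 \le \frac{(1-a)^2 \alpha^3}{32 \beta k} \quad\text{and}\quad \bar\mu^2 \le \frac{a\alpha}{k}.$$

Then $P_{R'_k}\big(\sum_{m=1}^{k} \mu_{i_m, j}^2 > \alpha\big) < 2 e^{-\beta/\alpha}$.

Proof: Denote $\alpha = \beta/\log(2N/\epsilon)$; then $\epsilon/N = 2e^{-\beta/\alpha}$. The claim is obtained by substituting $\alpha$ in (15)-(16). ■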


C. Proof of Theorem 2.1

We are now ready to prove the main Theorem 2.1. The proof relies on several results from [43]. The following theorem is a modification of Theorem 25 in that paper. Below $R$ denotes a linear operator that performs a restriction to $k$ coordinates chosen according to some rule (e.g., randomly). Its domain is determined by the context. Its adjoint $R^*$ acts on $\mathbb{R}^k$ by padding the $k$-vector with the appropriate number of zeros.

Theorem 3.5: (Decoupling of the spectral norm) Let $A$ be a $2N \times 2N$ symmetric matrix with zero diagonal. Let $\eta \in \{0, 1\}^{2N}$ be a random vector with $N$ components equal to one. Define the index sets $T_1(\eta) = \{i : \eta_i = 0\}$, $T_2(\eta) = \{i : \eta_i = 1\}$. Let $R$ be a random restriction to $k$ coordinates. For any $q \ge 1$ we have

$$(E\|RAR^*\|^q)^{1/q} \le 2 \max_{k_1 + k_2 = k} E_\eta \big(E\|R_1 A_{T_1(\eta) \times T_2(\eta)} R_2^*\|^q\big)^{1/q}, \tag{17}$$

where $A_{T_1(\eta) \times T_2(\eta)}$ denotes the submatrix of $A$ indexed by $T_1(\eta) \times T_2(\eta)$ and the matrices $R_i$ are independent restrictions to $k_i$ coordinates from $T_i$, $i = 1, 2$.

When $A$ has order $(2N+1) \times (2N+1)$, then an analogous result holds for partitions into blocks of size $N$ and $N+1$. Inequality (17) appeared in the proof of the decoupling theorem, Theorem 9 in [43]. The ideas behind it are due to [?].

The next lemma is due to Tropp [43] and Rudelson and Vershinin [40].

Lemma 3.6: Suppose that $A$ is a matrix with $N$ columns and let $R$ be a random restriction to $k$ coordinates. Let $q \ge 2$, $p = \max\{2, 2\log(\mathrm{rk}\, AR^*), q/2\}$. Then

$$(E\|AR^*\|^q)^{1/q} \le 3\sqrt{p}\, (E\|AR^*\|_{1\to 2}^q)^{1/q} + \sqrt{\frac{k}{N}}\, \|A\|,$$

where $\|\cdot\|_{1\to 2}$ is the maximum column norm.

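The following lemma is a simple generalization of Proposition 10 in [43]. The only difference is that we allow the quantity $\xi_q$ below to be a function of $q$ instead of a constant.

Lemma 3.7: Let $q, \lambda > 0$ and let $\xi_q$ be a positive function of $q$. Suppose that $Z$ is a positive random variable whose $q$th moment satisfies the bound

$$(EZ^q)^{1/q} \le \xi_q \sqrt{q} + \lambda.$$

Then

$$P\big(Z \ge e^{1/4}(\xi_q \sqrt{q} + \lambda)\big) \le e^{-q/4}.$$

Proof: By the Markov inequality,

$$P\big(Z \ge e^{1/4}(\xi_q \sqrt{q} + \lambda)\big) \le \frac{EZ^q}{\big(e^{1/4}(\xi_q \sqrt{q} + \lambda)\big)^q} \le \Big( \frac{\xi_q \sqrt{q} + \lambda}{e^{1/4}(\xi_q \sqrt{q} + \lambda)} \Big)^q = e^{-q/4}. \; ■$$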

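The main part of the proof is contained in the following lemma.

Lemma 3.8: Let $\Phi$ be an $m \times N$ matrix with mutual coherence parameter $\mu$. Suppose that for some $0 < \epsilon_1, \epsilon_2 < 1$

$$P_{R'_k}\big(\{(I, i) : \|\Phi_I^T \phi_i\|_2^2 > \epsilon_1\}\big) \le \epsilon_2. \tag{18}$$

Let $R$ be a random restriction to $k$ coordinates and $H = \Phi^T \Phi - \mathrm{Id}$. For any $q \ge 2$, $p = \max(2, 2\log(\mathrm{rk}\, RHR^*), q/2)$ we have

$$(E\|RHR^*\|^q)^{1/q} \le 6\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q} \mu \sqrt{k} + \sqrt{2k\bar\mu^2}\big) + \frac{2k}{N}\|\Phi\|^2. \tag{19}$$

Proof: We begin with setting the stage to apply Theorem 3.5. Let $\eta \in \{0, 1\}^N$ be a random vector with $N/2$ ones and let $R_1, R_2$ be random restrictions to $k_i$ coordinates in the sets $T_i(\eta)$, $i = 1, 2$, respectively. Denote by $\mathrm{supp}(R_i)$, $i = 1, 2$, the set of indices selected by $R_i$ and let $H(\eta) := H_{T_1(\eta) \times T_2(\eta)}$. Let $q \ge 1$ and let us bound the term $E_\eta(E\|R_1 H(\eta) R_2^*\|^q)^{1/q}$ that appears on the right side of (17). The expectation in the $q$-norm is computed for two random restrictions $R_1$ and $R_2$ that are conditionally independent given $\eta$. Let $E_i$ be the expectation with respect to $R_i$, $i = 1, 2$. Given $\eta$ we can evaluate these expectations in succession and apply Lemma 3.6 to $E_2$:

$$E_\eta(E\|R_1 H(\eta) R_2^*\|^q)^{1/q} = E_\eta\big[E_1(E_2\|R_1 H(\eta) R_2^*\|^q)\big]^{1/q}$$

$$\le E_\eta\Big\{E_1\Big[3\sqrt{p}\,(E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q)^{1/q} + \sqrt{\tfrac{2k_2}{N}}\,\|R_1 H(\eta)\|\Big]^q\Big\}^{1/q}$$

$$\le E_\eta\Big\{3\sqrt{p}\,\big[E_1(E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q)\big]^{1/q} + \sqrt{\tfrac{2k_2}{N}}\,\big[E_1\|R_1 H(\eta)\|^q\big]^{1/q}\Big\},$$

where on the last line we used the Minkowski inequality (recall that the random variables involved are finite). Now use Lemma 3.6 again to obtain

$$E_\eta(E\|R_1 H(\eta) R_2^*\|^q)^{1/q} \le 3\sqrt{p}\, E_\eta\big[E_1 E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q\big]^{1/q} + 3\sqrt{\tfrac{2pk_2}{N}}\, E_\eta\big(E_1\|H(\eta)^* R_1^*\|_{1\to 2}^q\big)^{1/q} + \sqrt{\tfrac{4k_1 k_2}{N^2}}\, E_\eta\|H(\eta)\|. \tag{20}$$

Let us examine the three terms on the right-hand side of the last expression. Let $\eta(R_2)$ be the random vector conditional on the choice of $k_2$ coordinates. The sample space for $\eta(R_2)$ is formed of all the vectors $\eta \in \{0, 1\}^N$ such that $\mathrm{supp}(R_2) \subset T_2(\eta)$. In other words, this is a subset of the sample space $\{0, 1\}^N$ that is compatible with a given $R_2$. The random restriction $R_1$ is still chosen out of $T_1(\eta)$ independently of $R_2$. Denote by $\tilde R$ a random restriction to $k_1$ indices in the set $(\mathrm{supp}(R_2))^c$ and let $\tilde E$ be the expectation computed with respect to it. We can write

$$E_\eta\big(E_1 E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q\big)^{1/q} \le \big(E_\eta E_1 E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q\big)^{1/q} = \big(E_2 \tilde E\|\tilde R H(\eta) R_2^*\|_{1\to 2}^q\big)^{1/q}.$$

Recall that $|H_{ij}| = \mu_{ij} \mathbb{1}(i \ne j)$ and that $\tilde R$ and $R_2$ are 0-1 matrices. Using this in the last equation, we obtain

$$E_2 \tilde E\|\tilde R H(\eta) R_2^*\|_{1\to 2}^q \le E_2 \tilde E \max_{j \in \mathrm{supp}(R_2)} \Big(\sum_{i \in \mathrm{supp}(\tilde R)} \mu_{ij}^2\Big)^{q/2}. \tag{21}$$

Now let us invoke assumption (18). Recalling that $k_1 \le k$, we see that with probability at least $1 - k_2\epsilon_2$ the sum in (21) is bounded above by $\epsilon_1$. For the other instances we use the trivial bound $k_1 \mu^2$. We obtain

$$3\sqrt{p}\, E_\eta\big(E_1 E_2\|R_1 H(\eta) R_2^*\|_{1\to 2}^q\big)^{1/q} \le 3\sqrt{p}\big((1 - k_2\epsilon_2)\epsilon_1^{q/2} + k_2\epsilon_2 (k_1\mu^2)^{q/2}\big)^{1/q}$$

$$\le 3\sqrt{p}\big(\epsilon_1^{q/2} + k_2\epsilon_2 (k_1\mu^2)^{q/2}\big)^{1/q} \le 3\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\sqrt{k_1}\,\mu\big),$$

where in the last step we used the inequality $a^q + b^q \le (a + b)^q$ valid for all $q \ge 1$ and positive $a, b$. Let us turn to the second term on the right-hand side of (20). We observe that

$$\|H(\eta)^* R_1^*\|_{1\to 2} = \max_{j \in \mathrm{supp}(R_1)} \|H_{j, T_2(\eta)}\|_2,$$

where $H_{j,\cdot}$ denotes the $j$th row of $H$ and $H_{j, T_2(\eta)}$ is a restriction of the $j$th row to the indices in $T_2(\eta)$. Since $\|H_{j,\cdot}\|_2^2 = \sum_{i \ne j} \mu_{ij}^2 \le (N-1)\bar\mu^2$, the second term is at most $3\sqrt{p}\sqrt{2k\bar\mu^2}$.

Finally, the third term in (20) can be bounded as follows:

$$\sqrt{\frac{4k_1 k_2}{N^2}}\, E_\eta\|H(\eta)\| \le \frac{k_1 + k_2}{N}\, \|H\| \le \frac{k}{N}\, \|\Phi\|^2,$$

where the last step uses the fact that the columns of $\Phi$ have unit norm, and so $\|\Phi\|^2 \ge N/m > 1$.

Combining all the information accumulated up to this point in (20), we obtain

$$E_\eta(E\|R_1 H(\eta) R_2^*\|^q)^{1/q} \le 3\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\bar\mu^2}\big) + \frac{k}{N}\|\Phi\|^2.$$

Finally, use this estimate in (17) to obtain the claim of the lemma. ■

Proof of Theorem 2.1: The strategy is to fix a triple $a, b, c \in (0, 1)$ that satisfies the constraint on the constants in Theorem 2.1 and to prove that (2) implies $(k, \delta, \epsilon)$-StRIP. Let $\epsilon_1 = b/\log(1/\epsilon)$ and $\epsilon_2 = k^{-1+\log\epsilon}$. In Corollary 3.4 set $\alpha = \epsilon_1$ and $\beta = \alpha\log(2/\epsilon_2)$. Under the assumptions in (2) this corollary implies that

$$P_{R'_k}\Big(\sum_{m=1}^{k} \mu_{i_m, j}^2 > \epsilon_1\Big) \le \epsilon_2.$$

Invoking Lemma 3.8 we conclude that (19) holds with the current values of $\epsilon_1, \epsilon_2$. For any $q \ge 4\log k$ we have $p = q/2$, and thus (19) becomes

$$(E\|RHR^*\|^q)^{1/q} \le 3\sqrt{2q}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\bar\mu^2}\big) + \frac{2k}{N}\|\Phi\|^2. \tag{22}$$

Introduce the following quantities:

$$\xi_q = 3\sqrt{2}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\bar\mu^2}\big) \quad\text{and}\quad \lambda = \frac{2k}{N}\|\Phi\|^2.$$

Now (22) matches the assumption of Lemma 3.7 and we obtain

$$P_{R_k}\big(\|RHR^*\| \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le e^{-q/4}. \tag{23}$$

Choose $q = 4\log(1/\epsilon)$, which is consistent with our earlier assumptions on $k$, $q$, and $\epsilon$. With this, we obtain

$$P_{R_k}\big(\|RHR^*\| \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le \epsilon.$$

Now observe that $\|RHR^*\| \le \delta$ is precisely the RIP property for the support identified by the matrix $R$. Let us verify that the inequality

$$6\sqrt{2}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\sqrt{k}\,\mu + \sqrt{2k\bar\mu^2}\big)\sqrt{\log(1/\epsilon)} + \frac{2k}{N}\|\Phi\|^2 \le e^{-1/4}\delta$$

is equivalent to the constraint on $a, b, c$. This is shown by substituting $\epsilon_1$ and $\epsilon_2$ with their definitions, and $\mu$ and $\bar\mu^2$ with their bounds in the statement of the theorem. Thus, $P_{R_k}(\|RHR^*\| > \delta) < \epsilon$, which establishes the StRIP property of $\Phi$. ■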
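Two pieces of arithmetic in the last proof can be checked numerically: with $q = 4\log(1/\epsilon)$ the tail $e^{-q/4}$ equals $\epsilon$, and, assuming the choice $\epsilon_2 = k^{-1+\log\epsilon}$ (our reading of the definition in the proof), the factor $(k\epsilon_2)^{1/q}$ equals $k^{-1/4}$, which is what compensates the $\sqrt{k}$ multiplying $\mu$. A minimal plain-Python sketch, with a function name and numerical values of our own choosing:

```python
import math

def proof_constants(k, eps):
    """Return (q, eps2, (k*eps2)**(1/q), exp(-q/4)) for q = 4 log(1/eps)."""
    q = 4 * math.log(1 / eps)
    eps2 = k ** (-1 + math.log(eps))  # assumed form of epsilon_2
    return q, eps2, (k * eps2) ** (1 / q), math.exp(-q / 4)

k, eps = 50, 0.01  # eps < 1/k, as the theorem requires
q, eps2, factor, tail = proof_constants(k, eps)
print(q, factor, k ** -0.25, tail)
```

The condition $\epsilon < 1/k$ gives $q = 4\log(1/\epsilon) \ge 4\log k$, which is the range in which $p = q/2$ was used.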

IV. Examples and extensions 
A. Examples of sampling matrices. 

It is known [?] that experimental performance of many known RIP sampling matrices in sparse recovery is far better than predicted by the theoretical estimates. Theorems 3.1 and 2.1 provide some insight into the reasons for such behavior. As an example, take binary matrices constructed from the Delsarte-Goethals codes mentioned previously. The sampling matrices $\Phi$ obtained from them are coherence-invariant. If we take $s$ to be an odd integer and set $r = (s+1)/2$, then we obtain for this family of matrices the parameters

$$m = 2^{4r}, \quad N = 2^{-r} m^{r+2}, \quad \mu = m^{-1/4}.$$

As noted above, we have $\bar\mu^2 < 1/m$ and $\|\Phi\| = \sqrt{N/m}$. Thus for $\mu$ and $\bar\mu^2$ to satisfy the assumptions in Theorems 3.1 and 2.1 we need $m$, $N$, and $k$ to satisfy the relation $m = \Theta(k \log^3 \frac{N}{\epsilon})$, which is nearly optimal for sparse recovery. Note that to satisfy just the assumptions of Thm. 2.1 we can construct a Delsarte-Goethals matrix with shorter column length of $m = O(k \log k)$; see Section II-C.

Similar logic leads to derivations of such relations for other matrices. We summarize these arguments in the next proposition, which shows that matrices with nearly optimal sketch length support high-probability recovery of sparse signals chosen from the generic signal model (more on sparse recovery in the Appendix; see in particular Theorem A.1).

Definition 4.1: We say that a signal $x \in \mathbb{R}^N$ is drawn from a generic random signal model $S_k$ if

1) The locations of the $k$ coordinates of $x$ with largest magnitudes are chosen among all $k$-subsets $I \subset [N]$ with a uniform distribution;

2) Conditional on $I$, the signs of the coordinates $x_i$, $i \in I$, are i.i.d. uniform Bernoulli random variables taking values in the set $\{1, -1\}$.

Proposition 4.1: Let $\Phi$ be an $m \times N$ sampling matrix. Suppose that it has coherence parameters $\mu = O(m^{-1/4})$, $\bar\mu^2 = O(m^{-1})$, and

$$\|\Phi\| = o(\sqrt{N/k}).$$

If $m = O(k(\log(N/\epsilon))^3)$ and $k < 1/\epsilon$, then $\Phi$ supports sparse recovery under Basis Pursuit for all but an $\epsilon$ proportion of $k$-sparse signals chosen from the generic random signal model $S_k$.

We remark that the conditions on mean square coherence are generally easy to achieve. As seen from Table I below, they are satisfied by most examples considered in the existing literature, including both random and deterministic constructions. The most problematic quantity is the mutual coherence parameter $\mu$. It might either be large itself, or have a large theoretical bound. Compared to earlier work, our results rely on a more relaxed condition on $\mu$, enabling us to establish near-optimality for new classes of matrices. For readers’ convenience, we summarize in Table I a list of such optimal matrices along with several of their useful properties. A systematic description of all but the last two classes of matrices can be found in [?]. Therefore we limit ourselves to giving definitions and performing some not immediately obvious calculations of the newly defined parameter, the mean square coherence.

Normalized Gaussian Frames. A normalized Gaussian frame is obtained by normalizing each column of a Gaussian matrix with independent, Gaussian-distributed entries that have zero mean and unit variance. The mutual coherence and spectral norm of such matrices were characterized in [?] (see Table I). These results together with the relation $\bar\mu^2 \le \mu^2$ lead to a trivial upper bound on $\bar\mu^2$, namely $\bar\mu^2 \le 15 \log N / m$. Since this bound is already tight enough for $\bar\mu^2$ to satisfy the assumption of Proposition 4.1, and to avoid distraction from the main goals of the paper, we made no attempt to refine it here.


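Random Harmonic Frames: Let $\mathcal{F}$ be an $N \times N$ discrete Fourier transform matrix, i.e., $\mathcal{F}_{j,k} = \frac{1}{\sqrt{N}} e^{2\pi i jk/N}$. Let $\eta_i$, $i = 1, \dots, N$, be a sequence of independent Bernoulli random variables with mean $m/N$. Set $\mathcal{M} = \{i : \eta_i = 1\}$ and use $\mathcal{F}_{\mathcal{M}}$ to denote the submatrix of $\mathcal{F}$ whose row indices lie in $\mathcal{M}$. Then the random matrix $\sqrt{N/|\mathcal{M}|}\, \mathcal{F}_{\mathcal{M}}$ is called a random harmonic frame [20], [?]. In the next proposition we compute the mean square coherence for all realizations of this matrix.

Proposition 4.2: All instances of the random harmonic frames are coherence-invariant with the following mean square coherence:

$$\bar\mu^2 = \frac{N - |\mathcal{M}|}{(N-1)|\mathcal{M}|}.$$

Proof: For each $t \in [|\mathcal{M}|]$, let $a_t$ be the $t$-th member of $\mathcal{M}$. To prove coherence invariance, we only need to show that $\{\mu_{j,k} : k \in [N] \setminus j\} = \{\mu_{N,k} : k \in [N-1]\}$ holds for all $j \in [N]$. This is true since

$$\mu_{j,k} = \frac{1}{|\mathcal{M}|} \Big| \sum_{t=1}^{|\mathcal{M}|} e^{2\pi i (j-k) a_t / N} \Big| = \mu_{N, (k-j+N) \bmod N} \quad \text{for } k \ne j.$$

In words, the $k$th coherence in the set $\{\mu_{j,k}, k \in [N] \setminus j\}$ is exactly the $((k - j + N) \bmod N)$-th coherence in $\{\mu_{N,k}, k \in [N-1]\}$; therefore the two sets are equal. We proceed to calculate the mean square coherence:

$$\bar\mu^2 = \frac{1}{N(N-1)|\mathcal{M}|^2} \sum_{j \ne k,\ j,k=1}^{N} \Big| \sum_{t=1}^{|\mathcal{M}|} e^{2\pi i (j-k) a_t / N} \Big|^2$$

$$= \frac{1}{N(N-1)|\mathcal{M}|^2} \sum_{j \ne k,\ j,k=1}^{N} \Big( |\mathcal{M}| + \sum_{t_1 \ne t_2} e^{2\pi i (j-k)(a_{t_1} - a_{t_2})/N} \Big)$$

$$= \frac{1}{N(N-1)|\mathcal{M}|^2} \Big( N(N-1)|\mathcal{M}| + \sum_{t_1 \ne t_2} \sum_{k=1}^{N} \sum_{j \ne k} e^{2\pi i (j-k)(a_{t_1} - a_{t_2})/N} \Big)$$

$$= \frac{1}{N(N-1)|\mathcal{M}|^2} \big( N(N-1)|\mathcal{M}| - |\mathcal{M}|(|\mathcal{M}| - 1)N \big) = \frac{N - |\mathcal{M}|}{(N-1)|\mathcal{M}|}. \; ■$$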
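Proposition 4.2 can be verified numerically for any particular row set. The plain-Python sketch below computes, for every column, the average of the squared coherences with the remaining columns and compares it with $(N - |\mathcal{M}|)/((N-1)|\mathcal{M}|)$. The values $N = 12$ and $\mathcal{M} = \{1, 2, 5, 7\}$ are an arbitrary illustrative choice of ours, not a random draw.

```python
import cmath
import math

def harmonic_coherence(n, rows, c1, c2):
    """Coherence between columns c1 and c2 of the frame built from the DFT rows in `rows`."""
    total = sum(cmath.exp(2j * math.pi * (c1 - c2) * a / n) for a in rows)
    return abs(total) / len(rows)

def mean_square_coherences(n, rows):
    """Per-column averages of the squared coherences with the other n - 1 columns."""
    return [sum(harmonic_coherence(n, rows, c, d) ** 2 for d in range(n) if d != c) / (n - 1)
            for c in range(n)]

n, rows = 12, [1, 2, 5, 7]
per_column = mean_square_coherences(n, rows)
predicted = (n - len(rows)) / ((n - 1) * len(rows))
print(max(per_column), predicted)
```

All twelve per-column averages coincide, as coherence invariance predicts, and they equal $8/44 \approx 0.182$.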


Chirp Matrices: Let $m$ be a prime. An $m \times m^2$ "chirp
matrix" $\Phi$ is defined by $\Phi_{t,\,am+b} = \frac{1}{\sqrt m} e^{2\pi i (b t^2 + a t)/m}$ for
$t, a, b = 1,\dots,m$. The coherence between each pair of column
vectors is known to be

$$\mu_{jk} \le \frac{1}{\sqrt m} \quad (j \ne k),$$

from which we immediately obtain the inequalities $\mu \le 1/\sqrt m$
and $\bar\mu^2 \le 1/m$. More details on these frames are given, e.g.,
in m], ED. 
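The coherence values of a small chirp matrix can be checked by brute force. The sketch below assumes the standard chirp definition $\Phi_{t,\,am+b} = m^{-1/2} e^{2\pi i (b t^2 + a t)/m}$ with $t, a, b = 1,\dots,m$ and an odd prime $m$; the function names and the choice $m = 5$ are illustrative.

```python
import cmath
import math


def chirp_columns(m):
    """Columns of the m x m^2 chirp matrix, indexed by the pair (a, b)."""
    cols = []
    for a in range(1, m + 1):
        for b in range(1, m + 1):
            cols.append([cmath.exp(2j * cmath.pi * (b * t * t + a * t) / m)
                         / math.sqrt(m) for t in range(1, m + 1)])
    return cols


def pairwise_coherences(cols):
    """Absolute inner products of all pairs of distinct columns."""
    out = []
    for p in range(len(cols)):
        for q in range(p + 1, len(cols)):
            inner = sum(x.conjugate() * y for x, y in zip(cols[p], cols[q]))
            out.append(abs(inner))
    return out


m = 5
mu = pairwise_coherences(chirp_columns(m))
max_mu = max(mu)                              # mutual coherence
mean_sq = sum(v * v for v in mu) / len(mu)    # mean square coherence
print(max_mu, 1 / math.sqrt(m))
print(mean_sq, 1 / (m + 1))
```

Columns that share the same chirp rate $b$ are orthogonal, and all other pairs have coherence exactly $1/\sqrt m$, so the mean square coherence works out to $1/(m+1)$, in agreement with the table entry and below the bound $1/m$.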

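Equiangular tight frames (ETFs): A matrix $\Phi$ is called
an ETF if its columns $\{\phi_i \in \mathbb{C}^m,\ i = 1,\dots,N\}$ satisfy the
following two conditions:

• $\|\phi_i\|_2 = 1$, for $i = 1,\dots,N$;

• $\mu_{ij} = \sqrt{\frac{N-m}{m(N-1)}}$, $i \ne j$.

From this definition we obtain $\mu = \bar\mu = \sqrt{\frac{N-m}{m(N-1)}}$.
The entry in the table also covers the recent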
construction of ETFs from Steiner systems EH- 
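A minimal worked instance of the ETF definition is the three-vector "Mercedes-Benz" frame in the plane ($m = 2$, $N = 3$). The sketch below, with illustrative function names, checks that all pairwise coherences equal the value $\sqrt{(N-m)/(m(N-1))}$ from the second condition.

```python
import math


def mercedes_benz_frame():
    """Three unit vectors in the plane, 120 degrees apart."""
    angles = [math.pi / 2 + 2 * math.pi * i / 3 for i in range(3)]
    return [(math.cos(t), math.sin(t)) for t in angles]


def etf_coherence(m, N):
    """Common coherence value of an m x N equiangular tight frame."""
    return math.sqrt((N - m) / (m * (N - 1)))


frame = mercedes_benz_frame()
pairs = [(0, 1), (0, 2), (1, 2)]
coherences = [abs(frame[i][0] * frame[j][0] + frame[i][1] * frame[j][1])
              for i, j in pairs]
print(coherences, etf_coherence(2, 3))
```

Because every pair attains the same coherence, the mean square coherence of an ETF is simply $\mu^2$.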

Reed-Muller matrices: In Table I we list two tight frames
obtained from binary codes. The Reed-Muller matrices are 
obtained from certain special subcodes of the second-order 
Reed-Muller codes ESI; their coherence parameter $\mu$ is found
in a and the mean square coherence is found from (10). The
Delsarte-Goethals matrices are also based on some subcodes 
of the second order Reed-Muller codes and were discussed 
earlier in this section. Both dictionaries form unit-norm tight 
frames (the rows of the matrix $\Phi$ are pairwise orthogonal),
with the consequence that $\|\Phi\| = \sqrt{N/m}$. We include these
two examples out of many other possibilities based on codes
because they appear in earlier works, and because their
parameters are in the range that fits our conditions well.

We note that the quaternary version of these frames is also 
of interest in the context of sparse recovery; see in particular 

ini. 

Deterministic sub-Fourier Construction sm-- Let $p > 2$ be
a prime, and let $f(x) \in \mathbb{F}_p[x]$ be a polynomial of degree
$d > 2$ over the finite field $\mathbb{F}_p$. Suppose that $m$ is some integer
satisfying < m < p. Then we can construct an m x p 

deterministic RIP matrix from a $p \times p$ DFT matrix by keeping
only the rows with indices in $\{f(n) \pmod p,\ n = 1,\dots,m\}$,
and normalizing the columns of the resulting matrix. These
submatrices form tight frames, and so their spectral norms can
be easily verified to be $\sqrt{p/m}$. It is known ED that this matrix
has mutual coherence no greater than $e^{3d} m^{-1/(9d^2\log d)}$. Even
though this bound is an artifact of the proof technique used in
ED, there seem to be no obvious ways of improving it.
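The row-selection step can be sketched in a few lines. The example below assumes the illustrative parameters $p = 11$, $m = 5$ and $f(x) = x^2$, chosen only because the selected row indices are then distinct; it is not claimed that these values satisfy the constraints on $m$ stated above. It checks the tight-frame property $\Phi\Phi^* = (p/m)\,\mathrm{Id}$, which gives $\|\Phi\| = \sqrt{p/m}$, and computes the mutual coherence by brute force.

```python
import cmath
import math


def sub_fourier(p, m, f):
    """Keep the DFT rows f(1), ..., f(m) (mod p); scale entries by 1/sqrt(m)
    so that every column has unit norm."""
    rows = [f(n) % p for n in range(1, m + 1)]
    matrix = [[cmath.exp(2j * cmath.pi * r * c / p) / math.sqrt(m)
               for c in range(p)] for r in rows]
    return rows, matrix


def row_gram(matrix):
    """Gram matrix of the rows, i.e. Phi times its conjugate transpose."""
    return [[sum(x * y.conjugate() for x, y in zip(u, v)) for v in matrix]
            for u in matrix]


def mutual_coherence(matrix):
    """Largest absolute inner product between two distinct columns."""
    n_cols = len(matrix[0])
    best = 0.0
    for c1 in range(n_cols):
        for c2 in range(c1 + 1, n_cols):
            inner = sum(row[c1].conjugate() * row[c2] for row in matrix)
            best = max(best, abs(inner))
    return best


p, m = 11, 5
rows, phi = sub_fourier(p, m, lambda x: x * x)
gram = row_gram(phi)
mu = mutual_coherence(phi)
print(rows)
print(gram[0][0].real, p / m)
print(mu)
```

If the polynomial maps two values of $n$ to the same residue, the selected rows repeat and the tight-frame check fails, so a sketch like this is only meaningful when the row indices are distinct.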
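| Name | R/C | Dimensions | $\mu$ | $\bar\mu^2$ |
|---|---|---|---|---|
| Normalized Gaussian (G) | R | $m \times N$ | $\le \frac{\sqrt{15\log N}}{\sqrt m - \sqrt{12\log N}}$ | |
| Random harmonic (RH) | C | $\lvert\mathcal M\rvert \times N$, $\frac12 m \le \lvert\mathcal M\rvert \le \frac32 m$ | $\le \sqrt{\frac{118(N-m)\log N}{mN}}$ | $\frac{N-\lvert\mathcal M\rvert}{\lvert\mathcal M\rvert(N-1)}$ |
| Chirp (C) | C | $m \times m^2$ | $\frac{1}{\sqrt m}$ | $\frac{1}{m+1}$ |
| ETF (including Steiner) | C | $\sqrt N \le m \le N$ | $\sqrt{\frac{N-m}{m(N-1)}}$ | $\frac{N-m}{m(N-1)}$ |
| Reed-Muller (RM) | R | $2^s \times 2^{s(1+t)}$ | | $\le 2^{-s}$ |
| Delsarte-Goethals set (DG) | R | $2^{2s+2} \times 2^{2(s+1)(r+2)-r}$ | $2^{r-s-1}$ | $\le 2^{-2s-2}$ |
| Deterministic subFourier (SF) | C | $m \times p$ | $\le e^{3d} m^{-1/(9d^2\log d)}$ | $\le \frac{1}{m}$ |

| Name | $\lVert\Phi\rVert$ | Restrictions | Probability | Requirement for StRIP: $m = O(\cdot)$ |
|---|---|---|---|---|
| G | $\le \frac{\sqrt m + \sqrt N + \sqrt{2\log N}}{\sqrt{m - \sqrt{8m\log N}}}$ | $60\log N \le m \le \frac{N-1}{4\log N}$ | $\ge 1 - \frac{11}{N}$ | $\max\{k, \sqrt{k\log k}\,\log N\}$ |
| RH | $\le \sqrt{\frac{2N}{m}}$ | $16\log N \le m \le \frac{N}{3}$ | $\ge 1 - \frac{4}{N} - \frac{1}{N^2}$ | $\max\{k, \sqrt{k\log k}\,\log N\}$ |
| C | $\sqrt m$ | $m$ is prime | deterministic | $k$ |
| ETF | $\sqrt{\frac{N}{m}}$ | | deterministic | $k$ |
| RM | $2^{ts/2}$ | $t < s/4$; $s$, $t$ are odd integers | deterministic | $k$ |
| DG | $2^{(s+1)(r+1)-r/2}$ | $r < s/2$ | deterministic | $k$ |
| SF | $\sqrt{\frac{p}{m}}$ | $p$ is prime, $m \le p$ | deterministic | $\max\{k, (k\log k)^{\frac{9d^2\log d}{3}}\}$ |

TABLE I

Examples for Theorem 2.1: classes of sampling matrices satisfying the StRIP.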
Appendix

Among the most studied estimators for sparse recovery is
the Basis Pursuit algorithm [12]. This is an $\ell_1$-minimization
algorithm that provides an estimate of the signal through
solving a convex programming problem

$$\hat x = \arg\min \|\tilde x\|_1 \quad \text{subject to } \Phi\tilde x = y. \qquad (24)$$

In this section we prove approximation error bounds for 
recovery by Basis Pursuit from linear sketches obtained using 
deterministic matrices with the StRIP and SINC properties. 

It was proved in [44] that random sparse signals sampled
using matrices with the StRIP property can be recovered with
high probability from low-dimensional sketches using linear
programming. Theorem A.1 below generalizes this result to
signals that are not necessarily sparse. Its proof essentially
follows from [20] with an extra calculation of the failure
rate stemming from replacing the hard RIP condition with its
statistical version. It is presented here for the reader's convenience.

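Theorem A.1: Suppose that $x$ is a generic random signal
from the model $\mathcal S_k$. Let $y = \Phi x$ and let $\hat x$ be the approximation
of $x$ by the Basis Pursuit algorithm. Let $I$ be the set of
$k$ largest coordinates of $x$. If

1) $\Phi$ is $(k, \delta, \epsilon)$-StRIP;

2) $\Phi$ is $\left(k, \frac{(1-\delta)^2}{8\log(2N/\epsilon)}, \epsilon\right)$-SINC,

then with probability at least $1 - 3\epsilon$

$$\|x_I - \hat x_I\|_2 \le \frac{4}{2\sqrt{2\log(2N/\epsilon)}} \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1 \qquad (25)$$

and

$$\|x_{I^c} - \hat x_{I^c}\|_1 \le 4 \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1. \qquad (26)$$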


This theorem implies that if the signal $x$ itself is $k$-sparse then
the basis pursuit algorithm will recover it exactly. Otherwise,
its output $\hat x$ will be a tight sparse approximation of $x$. Note
that it is easy to join the estimates (25) and (26) into a single
inequality that gives an $\ell_2/\ell_1$ error guarantee.

Theorem A.1 will follow from the next three lemmas. Some
of the ideas involved in their proofs are close to the techniques
used in [20]. Let $h = \hat x - x$ be the error in recovery of basis
pursuit. In the following $I \subset [N]$ refers to the support of the
$k$ largest coordinates of $x$.
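Lemma A.2: Let $s = 8\log(2N/\epsilon)$. Suppose that

$$\|(\Phi_I^T \Phi_I)^{-1}\| \le \frac{1}{1-\delta}$$

and

$$\|\Phi_I^T \phi_i\|_2^2 \le s^{-1}(1-\delta)^2 \quad \text{for all } i \in I^c := [N] \setminus I.$$

Then

$$\|h_I\|_2 \le s^{-1/2}\,\|h_{I^c}\|_1.$$

Proof: Clearly, $\Phi h = \Phi\hat x - \Phi x = 0$, so $\Phi_I h_I = -\Phi_{I^c} h_{I^c}$ and

$$h_I = -(\Phi_I^T \Phi_I)^{-1} \Phi_I^T \Phi_{I^c} h_{I^c}.$$

We obtain

$$\|h_I\|_2 \le \|(\Phi_I^T \Phi_I)^{-1}\| \sum_{i \in I^c} \|\Phi_I^T \phi_i\|_2\, |h_i| \le \frac{1}{1-\delta} \cdot s^{-1/2}(1-\delta)\, \|h_{I^c}\|_1 = s^{-1/2}\,\|h_{I^c}\|_1,$$

as required. ■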

Next we show that the error outside $I$ cannot be large.
Below $\operatorname{sgn}(u)$ is a $\pm 1$-vector of signs of the argument vector
$u$.

Lemma A.3: Suppose that there exists a vector $v \in \mathbb{R}^N$
such that

(i) $v$ is contained in the row space of $\Phi$, say $v = \Phi^T w$;

(ii) $v_I = \operatorname{sgn}(x_I)$;

(iii) $\|v_{I^c}\|_\infty \le 1/2$.

Then

$$\|h_{I^c}\|_1 \le 4\,\|x_{I^c}\|_1. \qquad (27)$$

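Proof: By (24) we have

$$\begin{aligned}
\|x\|_1 \ge \|\hat x\|_1 = \|x + h\|_1 &= \|x_I + h_I\|_1 + \|x_{I^c} + h_{I^c}\|_1 \\
&\ge \|x_I\|_1 + \langle \operatorname{sgn}(x_I), h_I \rangle + \|h_{I^c}\|_1 - \|x_{I^c}\|_1.
\end{aligned}$$

Here we have used the inequality $\|a + b\|_1 \ge \|a\|_1 + \langle \operatorname{sgn}(a), b \rangle$,
valid for any two vectors $a, b \in \mathbb{R}^N$, and the
triangle inequality. From this we obtain

$$\|h_{I^c}\|_1 \le |\langle \operatorname{sgn}(x_I), h_I \rangle| + 2\,\|x_{I^c}\|_1.$$

Further, using the properties of $v$, we have

$$\begin{aligned}
|\langle \operatorname{sgn}(x_I), h_I \rangle| &= |\langle v_I, h_I \rangle| \\
&= |\langle v, h \rangle - \langle v_{I^c}, h_{I^c} \rangle| \\
&\le |\langle \Phi^T w, h \rangle| + |\langle v_{I^c}, h_{I^c} \rangle| \\
&\le |\langle w, \Phi h \rangle| + \|v_{I^c}\|_\infty \|h_{I^c}\|_1 \\
&\le \tfrac12 \|h_{I^c}\|_1,
\end{aligned}$$

where the last step uses $\Phi h = 0$ and property (iii). The statement of the lemma is now evident. ■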
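The elementary inequality $\|a + b\|_1 \ge \|a\|_1 + \langle \operatorname{sgn}(a), b \rangle$ used in the proof holds coordinate-wise, since $|a_i + b_i| \ge \operatorname{sgn}(a_i)(a_i + b_i)$. The sketch below checks it on random vectors; the helper names are illustrative, and the sign of a zero coordinate is taken to be $+1$, which is an arbitrary convention that does not affect the inequality.

```python
import random


def sign_vector(a):
    """Vector of +/-1 signs (zero coordinates are assigned +1)."""
    return [1.0 if v >= 0 else -1.0 for v in a]


def l1_norm(a):
    return sum(abs(v) for v in a)


def inequality_gap(a, b):
    """Left side minus right side of ||a+b||_1 >= ||a||_1 + <sgn(a), b>."""
    signs = sign_vector(a)
    total = [u + v for u, v in zip(a, b)]
    inner = sum(s * v for s, v in zip(signs, b))
    return l1_norm(total) - l1_norm(a) - inner


random.seed(1)
gaps = []
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(8)]
    b = [random.uniform(-1, 1) for _ in range(8)]
    gaps.append(inequality_gap(a, b))
print(min(gaps) >= -1e-12)
```

The gap is zero exactly when every coordinate of $a + b$ has the same sign as the corresponding coordinate of $a$.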

Now we prove that such a vector v as defined in the last 
lemma indeed exists. 

Lemma A.4: Let $x$ be a generic random signal from the
model $\mathcal S_k$. Suppose that the support $I$ of the $k$ largest
coordinates of $x$ is fixed. Under the assumptions of Lemma A.2 the
vector

$$v = \Phi^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \operatorname{sgn}(x_I)$$

satisfies (i)-(iii) of Lemma A.3 with probability at least $1 - \epsilon$.


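Proof: From the definition of $v$ it is clear that it belongs
to the row space of $\Phi$ and $v_I = \operatorname{sgn}(x_I)$. We have
$v_i = \phi_i^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \operatorname{sgn}(x_I) = \langle s_i, \operatorname{sgn}(x_I) \rangle$, where

$$s_i = (\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i \in \mathbb{R}^k.$$

We will show that $|v_i| \le \frac12$ for all $i \in I^c$ with probability
$1 - \epsilon$.

Since the coordinates of $\operatorname{sgn}(x_I)$ are i.i.d. uniform random
variables taking values in the set $\{\pm 1\}$, we can use Hoeffding's
inequality to claim that

$$P(|v_i| > 1/2) \le 2\exp\left(-\frac{1}{8\|s_i\|_2^2}\right). \qquad (28)$$

On the other hand, for all $i \in I^c$,

$$\begin{aligned}
\|s_i\|_2 &= \|(\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i\|_2 \\
&\le \|(\Phi_I^T \Phi_I)^{-1}\|\, \|\Phi_I^T \phi_i\|_2 \\
&\le \frac{1}{1-\delta} \cdot \frac{1-\delta}{\sqrt{8\log(2N/\epsilon)}} = \frac{1}{\sqrt{8\log(2N/\epsilon)}}. \qquad (29)
\end{aligned}$$

Equations (28) and (29) together imply for any $i \in I^c$

$$P\left(|v_i| > \tfrac12\right) \le 2\exp\left(-\frac{1}{8\left(1/\sqrt{8\log(2N/\epsilon)}\right)^2}\right) = \frac{\epsilon}{N}.$$

Using the union bound, we now obtain the following relation:

$$P(\|v_{I^c}\|_\infty > 1/2) \le \epsilon. \qquad (30)$$

Hence $|v_i| \le \frac12$ for all $i \in I^c$ with probability at least $1 - \epsilon$.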
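The arithmetic that turns (28) and (29) into the per-coordinate failure rate $\epsilon/N$ can be checked directly: substituting $\|s_i\|_2^2 \le 1/(8\log(2N/\epsilon))$ into the Hoeffding bound gives $2\exp(-\log(2N/\epsilon)) = \epsilon/N$. The sketch below evaluates both sides; the function name and the sample values of $N$ and $\epsilon$ are illustrative.

```python
import math


def tail_bound(N, eps):
    """Right-hand side of (28) evaluated at the bound (29) on ||s_i||_2."""
    s_norm_sq = 1.0 / (8.0 * math.log(2 * N / eps))  # square of bound (29)
    return 2.0 * math.exp(-1.0 / (8.0 * s_norm_sq))


for N, eps in [(100, 0.1), (10**4, 0.01)]:
    print(tail_bound(N, eps), eps / N)
```

Summing this bound over at most $N$ coordinates in $I^c$ gives the total failure probability $\epsilon$ in (30).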


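Now we are ready to prove Theorem A.1.

Proof of Theorem A.1: The matrix $\Phi$ is $(k, \delta, \epsilon)$-StRIP.
Hence, with probability at least $1 - \epsilon$, $\|(\Phi_I^T \Phi_I)^{-1}\| \le \frac{1}{1-\delta}$.
At the same time, from the SINC assumption we have, with
probability at least $1 - \epsilon$ over the choice of $I$,

$$\|\Phi_I^T \phi_i\|_2^2 \le \frac{(1-\delta)^2}{8\log(2N/\epsilon)}$$

for all $i \in I^c$. Thus, $\Phi_I$ will have these two properties with
probability at least $1 - 2\epsilon$. Then from Lemma A.2 we obtain
that

$$\|h_I\|_2 \le \frac{1}{\sqrt{8\log(2N/\epsilon)}}\, \|h_{I^c}\|_1$$

with probability $\ge 1 - 2\epsilon$. Furthermore, from Lemmas A.3 and A.4,

$$\|h_{I^c}\|_1 \le 4\,\|x_{I^c}\|_1$$

with probability $1 - \epsilon$. Since $\min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1 = \|x_{I^c}\|_1$,
both bounds hold simultaneously with probability at least $1 - 3\epsilon$ by the
union bound. This completes the proof.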